
[go/mysql] Avoid static buffer and use tiered pool in #4182

Merged
sougou merged 1 commit into vitessio:master from LK4D4:avoid_static_buffer
Sep 6, 2018

Conversation

Contributor

@LK4D4 LK4D4 commented Sep 5, 2018

Benchmark results:

benchmark                            old ns/op     new ns/op     delta
BenchmarkParallelShortQueries-8      3259          3368          +3.34%
BenchmarkParallelMediumQueries-8     6444          6561          +1.82%
BenchmarkParallelRandomQueries-8     5963212       5441551       -8.75%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkParallelShortQueries-8      3.37         3.27         0.97x
BenchmarkParallelMediumQueries-8     2543.26      2497.78      0.98x

benchmark                            old allocs     new allocs     delta
BenchmarkParallelShortQueries-8      23             23             +0.00%
BenchmarkParallelMediumQueries-8     9              8              -11.11%
BenchmarkParallelRandomQueries-8     14             14             +0.00%

benchmark                            old bytes     new bytes     delta
BenchmarkParallelShortQueries-8      720           812           +12.78%
BenchmarkParallelMediumQueries-8     27205         22720         -16.49%
BenchmarkParallelRandomQueries-8     35898388      29652823      -17.40%

I added a tiered pool here because without it the performance hit is too big (apparently because we use a smaller buffer for readOnePacket or something, and then when we read a query, with very high probability the buffer we get from the pool is that smaller buffer).
cc @danieltahara
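For readers following along, the tiered-pool idea can be sketched like this — a minimal, hypothetical illustration (the actual implementation lives in the bucketpool package split out to #4183; all names here are made up):

```go
package main

import (
	"fmt"
	"sync"
)

// tieredPool keeps one sync.Pool per power-of-two size class, so a
// small read never "poisons" the pool with undersized buffers.
type tieredPool struct {
	sizes []int
	pools []*sync.Pool
}

func newTieredPool(minSize, maxSize int) *tieredPool {
	p := &tieredPool{}
	for s := minSize; s <= maxSize; s *= 2 {
		size := s // capture per-iteration value for the closure
		p.sizes = append(p.sizes, size)
		p.pools = append(p.pools, &sync.Pool{
			New: func() interface{} { return make([]byte, size) },
		})
	}
	return p
}

// Get returns a buffer of at least n bytes, or nil if n exceeds the
// largest class (callers then allocate directly, bypassing the pool).
func (p *tieredPool) Get(n int) []byte {
	for i, s := range p.sizes {
		if n <= s {
			return p.pools[i].Get().([]byte)[:n]
		}
	}
	return nil
}

// Put returns a buffer to the class it came from.
func (p *tieredPool) Put(b []byte) {
	for i, s := range p.sizes {
		if cap(b) == s {
			p.pools[i].Put(b[:s])
			return
		}
	}
}

func main() {
	tp := newTieredPool(1024, 16*1024)
	b := tp.Get(2000)
	fmt.Println(len(b), cap(b)) // 2000 2048
	tp.Put(b)
}
```

A 2000-byte request lands in the 2048-byte class rather than forcing every pooled buffer up to the maximum size.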

Contributor Author

I benchmarked, and there is now no difference between the New and the if i == nil approaches.


there should be at least 1 alloc/op due to the interface{} framing. probably irrelevant for the simplification.
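For context, that alloc comes from boxing the slice header into an interface{} as it passes through sync.Pool. One common mitigation — shown here as a generic sketch, not necessarily what this PR does — is to pool a pointer to the slice:

```go
package main

import (
	"fmt"
	"sync"
)

// Storing a []byte in a sync.Pool copies the slice header into an
// interface{}, which escapes and costs one allocation per Put.
// Pooling *[]byte instead allocates the header once, up front.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b // pointer avoids re-boxing the slice header
	},
}

// withBuffer borrows a pooled buffer for the duration of f.
func withBuffer(f func([]byte)) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	f(*bp)
}

func main() {
	withBuffer(func(b []byte) {
		fmt.Println(len(b)) // 4096
	})
}
```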

Contributor

@sougou sougou left a comment

Initial comments from eyeballing.

Contributor

return p.pools[len(p.pools)-1] instead.

Contributor Author

Not sure I understand why :/ It will return the last pool, which contains buffers smaller than size.

Contributor Author

Oh, even though it shouldn't be reachable...
Maybe panic? :)

Contributor

Use an expression instead. It will be complex, but efficient (and worth it). This also means that maxSize is not needed.
And write lots of tests to prove it's correct, especially boundary conditions :).
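One way to read "use an expression": compute the bucket index directly from the size with bit operations instead of scanning the pool list. A hedged sketch, assuming power-of-two buckets starting at a minSize of 1024 (the real bucketpool in #4183 may differ):

```go
package main

import (
	"fmt"
	"math/bits"
)

const minSize = 1024 // smallest bucket; must be a power of two

// bucketIdx maps a requested size to a power-of-two bucket index:
// anything up to minSize lands in bucket 0, and each doubling of
// the bucket size adds one. No loop, and no maxSize needed.
func bucketIdx(size int) int {
	if size <= minSize {
		return 0 // also covers tiny sizes, e.g. size == 1
	}
	// bits.Len(x-1) equals ceil(log2(x)) for x > 1.
	return bits.Len(uint(size-1)) - bits.Len(uint(minSize-1))
}

func main() {
	fmt.Println(bucketIdx(1), bucketIdx(1025), bucketIdx(2049)) // 0 1 2
}
```

The boundary sizes (exactly minSize, minSize+1, exact powers of two) are precisely the cases the tests should pin down.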

Contributor Author

I don't understand what "expression" means here :)

Contributor Author

Ooooh, I got it. Sorry :) Will do.

Contributor Author

LK4D4 commented Sep 5, 2018

@sougou I actually split the bucketpool change out to #4183 for easier review.

Contributor

sougou commented Sep 5, 2018

I'll clarify my comments in the other PR.

@danieltahara danieltahara left a comment

obviously also pending rebase on the other diff.

would be curious to see the delta between bucket pool and here


go/mysql/conn.go Outdated

I wonder if it makes sense to return "finisher" functions from these? That way it prevents people from forgetting to recycle. Same with the write side.
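The "finisher" pattern being suggested could look something like this (hypothetical names and signatures, not the actual conn.go API):

```go
package main

import (
	"fmt"
	"sync"
)

const connBufferSize = 4096 // pooled buffer size; illustrative value

var pool = sync.Pool{New: func() interface{} {
	b := make([]byte, connBufferSize)
	return &b
}}

// startEphemeralBuf hands out a buffer together with a finisher that
// recycles it; callers who must defer the finisher can't forget to
// recycle the buffer.
func startEphemeralBuf(length int) ([]byte, func()) {
	if length > connBufferSize {
		// Too big for the pool: allocate directly, finisher is a no-op.
		return make([]byte, length), func() {}
	}
	bp := pool.Get().(*[]byte)
	return (*bp)[:length], func() { pool.Put(bp) }
}

func main() {
	buf, done := startEphemeralBuf(128)
	defer done()
	fmt.Println(len(buf)) // 128
}
```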

go/mysql/conn.go Outdated

You shouldn't need this length check anymore. Same with read.

Also, BigBuffer can go away (or whatever the last "policy" is).

Contributor Author

True for writes, but for reads the code is different for large buffers: it reads packets one by one instead of using ReadAll.


Ah okay. Feel free to leave as is.

Contributor Author

LK4D4 commented Sep 5, 2018

So, here are the benchmark results compared to the tiered pool PR:

benchmark                            old ns/op     new ns/op     delta
BenchmarkParallelShortQueries-8      3726          3771          +1.21%
BenchmarkParallelMediumQueries-8     6016          7715          +28.24%
BenchmarkParallelRandomQueries-8     5432915       5531186       +1.81%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkParallelShortQueries-8      2.95         2.92         0.99x
BenchmarkParallelMediumQueries-8     2724.08      2124.35      0.78x

benchmark                            old allocs     new allocs     delta
BenchmarkParallelShortQueries-8      23             23             +0.00%
BenchmarkParallelMediumQueries-8     8              8              +0.00%
BenchmarkParallelRandomQueries-8     14             14             +0.00%

benchmark                            old bytes     new bytes     delta
BenchmarkParallelShortQueries-8      720           815           +13.19%
BenchmarkParallelMediumQueries-8     20886         23261         +11.37%
BenchmarkParallelRandomQueries-8     29717958      30225503      +1.71%

And I don't know what could cause this slowdown... Maybe I'll try to revert to readPacketDirect... though it's not in the top 20 of the profile.

Contributor

sougou commented Sep 6, 2018

What are the benchmarks compared against?

Contributor

Will this work correctly if size was 1 and minSize was 1024?

Contributor

Nevermind. I see you set idx to 0 next :)

Contributor Author

LK4D4 commented Sep 6, 2018

@sougou Against #4183.
It's actually quite bizarre :( I found that this diff

diff --git a/go/mysql/conn.go b/go/mysql/conn.go
index 2ac69b54a..528bb1daa 100644
--- a/go/mysql/conn.go
+++ b/go/mysql/conn.go
@@ -166,7 +166,7 @@ type Conn struct {
        // Call startEphemeralPacket(length) to get a buffer. If length
        // is smaller or equal than connBufferSize-4, this buffer will be used.
        // Otherwise memory will be allocated for it.
-       buffer []byte
+       //buffer []byte

        // Keep track of how and of the buffer we allocated for an
        // ephemeral packet on the read and write sides.
@@ -193,7 +193,7 @@ func newConn(conn net.Conn) *Conn {
                reader:   bufio.NewReaderSize(conn, connBufferSize),
                writer:   bufio.NewWriterSize(conn, connBufferSize),
                sequence: 0,
-               buffer:   make([]byte, connBufferSize),
+               //buffer:   make([]byte, connBufferSize),
        }
 }

causes a performance degradation, especially for medium queries. I have no idea why, and I'm kinda losing my mind at this point :)

Contributor

sougou commented Sep 6, 2018

I think the reason the medium buffer is slower is that its size is just above maxSize, so it ends up with fresh allocations on every iteration.

I spent some more time digging through the code, and have a few observations. The TL;DR is that we can get rid of all read and write policies.

  • For read policy, MaxPacketSize is 16M. So, big buffers will never be put back into the pool because size exceeds maxSize. This means that we don't need to track policies.
  • We need more buckets. I think we should go up to a maxSize of 15M. Beware of benchmarks, because a random number will skew it towards the huge maxSize. You may have to exponentiate the random numbers, or use different bands.
  • For writes, there is a readability problem: the different code paths are not due to policy. They are actually due to the packet size. So, the code should be changed to explicitly reflect that. This means that we can get rid of write policies.
  • bufio.Writer has a Reset function which allows you to change the writer. This means that you can put it in a sync.Pool.
  • Also, the use of direct is bug prone. It should not be a flag passed into writeEphemeralPacket. Instead, there should be a separate set of functions that cleanly encapsulate the use of bufio with a defer that flushes it (and recycles if we use sync.Pool). This should probably be a separate PR.

Contributor Author

LK4D4 commented Sep 6, 2018

The problem is that the slowness isn't caused by any code change apart from removing the buffer allocation from Conn. Literally, if I just allocate an unused 16KB []byte, everything is perfect.
I'm not sure I follow the buckets problem: maxSize is MaxPacketSize in this PR (which is 16MB).


danieltahara commented Sep 6, 2018 via email

Contributor

sougou commented Sep 6, 2018

I got confused (by looking at the Pool benchmarks). I see that you're now using 16MB.

The buffer thing is weird indeed. We can look at varying the size of the medium query size to see where it jumps. That could give us a clue.

I'm wondering if pprof would help for such short runs.

Contributor Author

LK4D4 commented Sep 6, 2018

@sougou I tried queries twice as big, with the same result :(
Btw, adding _ [connBufferSize]byte to the struct "fixes" the problem. Apparently some GC tricks again.

Just use sync.Pool always

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
@danieltahara danieltahara left a comment

Removing my block. We can deprecate big buffers in a follow-up.

@sougou sougou merged commit 643873a into vitessio:master Sep 6, 2018